DOCUMENT RESUME

ED 022 751                                TE 000 510

ZEROING IN ON THE STEP WRITING TEST: WHAT DOES IT TELL A TEACHER?

Paper read at the Annual Meeting of the National Council on Measurement in Education (Chicago,
February 1966).

Available from- National Council on Measurement in Education, Office of Evaluation Services, Michigan State
Univ., East Lansing, Mich. (Single copy $2.50).

Journal Cit- Journal of Educational Measurement; v3 n1 p19-25 Spring 1966
Descriptors- [...] TEST INTERPRETATION, *TEST VALIDITY, VERBAL [...]

Identifiers- *Sequential Tests of Educational Progress Writing Test, STEP Writing Test

The validity of the multiple-choice Sequential Tests of Educational Progress
(STEP) Writing Test (1957) was tested by the Center for the Cooperative Study
of Instruction. Seven essay assignments were used to determine the relationship
between STEP and actual writing behavior [...] of the four objectives of the
STEP [...] between STEP and [...]. The results indicated that (1) the ability
to produce good writing is only moderately related to the ability to manipulate
previously given information, as on STEP, and (2) the total writing score from
STEP does not relate strongly to any of the essay writing criteria but does
agree moderately well with [...] score. The results must be qualified, however,
since no measures of parallel form or score reliability were available. (LH)



"PERMISSION TO REPRODUCE THIS COPYRIGHTED
MATERIAL BY MICROFICHE ONLY HAS BEEN GRANTED
TO ERIC AND ORGANIZATIONS OPERATING UNDER
AGREEMENTS WITH THE U.S. OFFICE OF EDUCATION.
FURTHER REPRODUCTION OUTSIDE THE ERIC SYSTEM
REQUIRES PERMISSION OF THE COPYRIGHT OWNER."



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.

JOURNAL OF EDUCATIONAL MEASUREMENT
VOLUME 3, NO. 1
SPRING, 1966



ZEROING IN ON THE STEP WRITING TEST:
WHAT DOES IT TELL A TEACHER?¹



George F. Madaus and Robert M. Rippey

Center for the Cooperative Study of Instruction
The University of Chicago






A large number of schools administer the STEP Writing Test (1957). Although
the publisher of STEP has clearly stated the objectives which STEP was designed
to measure, the question frequently arises from classroom teachers: just what can
you tell about a student's actual writing behavior from the results of a multiple
choice test?

The manual for interpreting scores claims that STEP measures ability to think
critically in writing, to organize materials, to write material appropriate for a
purpose, to write effectively, and to observe conventional usage in punctuation and
grammar (p. 7). The manual further states that, "The STEP Writing tests seek
to measure comprehensively the full range of skills involved in the process of
good writing."

Items on the STEP are classified according to five categories: 1. organization,
2. conventions, 3. critical thinking, 4. effectiveness, 5. appropriateness.

One reviewer (Buros, 1959, p. 593-4) asserts that STEP fails in all but the second
category and is only partially successful in measuring conventions. He arrived at his
conclusion from his analysis of item content. Perhaps revealing his own biases
against multiple choice writing tests, he concludes "any educator who wishes
to measure the full range of skills involved in the process of good writing will
resort to writing itself."

Hieronymus (Buros, 1959, p. 595) concludes, again from his analysis of item
content, that STEP measures "very effectively higher-order writing skills,
particularly those of effectiveness and appropriateness."

The publishers report correlations between STEP Writing and English grades.
The difficulty in using such grades as criteria of writing ability lies in the large
number of heterogeneous activities subsumed under an English grade. This has
been pointed out by the Report of the Commission on English (College Entrance
Examination Board, 1965).

Allen (Buros, 1959, p. 596-7), disturbed by the STEP Writing item content
and lack of statistical evidence of validity, urges that STEP Writing be compared
with other measures of actual writing.

¹ Read at the Annual Meeting of the National Council on Measurement in Education,
Chicago, Illinois, February, 1966.






This paper will describe an exploratory validity study of this type. 

During the past two years, the Center for the Cooperative Study of Instruction
has been conducting an experiment in the teaching of writing. This experiment
has resulted in the evaluation of students using both the STEP Writing Test
Level I and a set of seven criterion referenced scales, developed for the purpose
of helping graders identify the incidence of specific writing behaviors on the part of
students (Rippey, 1965).

These scales were developed by the experienced high school English teachers
participating in the project. They first chose seven skills they thought minimally
necessary for good writing and then listed behaviors they would accept as evidence
of these characteristics. The seven were (1) Punctuation, (2) Usage, (3) Sense of
Audience and Purpose, (4) Organization, (5) Use of Detail, (6) Attitude Toward
Writing, and (7) A Good Topic and Concluding Sentence. The scales with their
behaviors are found in Appendix 1.

The students were given four writing assignments at the beginning of the
experiment and the same four assignments seven months later.

Each assignment was structured to evoke samples of writing which then could 
be graded according to one or more of the seven criterion scales. 

The first assignment, "Why Read," was given after all the students had read
passages on the importance of reading Milton, Thoreau, Cervantes, Wolfe, Salinger,
and Faulkner. They were directed to present a convincing argument about the
importance of reading to other members of the class. This assignment was scored
using the Punctuation, Usage, Sense of Audience and Purpose, and Effective
Organization of the Paragraph scales.

A paragraph on "the most interesting person I have known" was scored using
the Use of Detail criteria. A paragraph on "everything you have written in the
past year" was scored using the Attitude Toward Writing criteria. A final
paragraph on any topic the student wished was scored using the Good Topic and
Concluding Sentence criteria.

Thus seven scores were obtained from the structured writing assignments on
a set of subordinate behaviors which, judiciously applied, should result in improved
writing on the part of the student.

Four of the stated objectives of STEP Writing appear to be congruent with
four of the criterion scales used to score the writing assignments, namely
Punctuation, Usage, Sense of Audience and Purpose, and Effective Organization of the
Paragraph. We therefore felt that the validity of these four STEP Writing
objectives should be tested using the essay scores as criteria.

Procedure and Results 

Carefully trained independent readers applied the rating scales to the papers
written by a random sample of one-third of the total number of students
participating in the experiment.






The freshmen, representing the entire experimental population of one school,
along with any students who did not have a complete set of scores, were removed
for purposes of this study. The remaining group of 101 sophomores, juniors, and
seniors drawn from two schools was used in the analyses which follow. Only the
post-test data from the experiment were used.

Correlations among the scores assigned by two graders on each of the seven
variables are contained in Table 1.

No measures of parallel form or score reliability were available. The table also
shows the intercorrelations among the criterion-derived essay scores.
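
The inter-rater figures on the diagonal of Table 1 are correlations between the scores two graders gave to the same papers. The paper does not say which coefficient was used; the sketch below assumes the ordinary product-moment correlation. The two score lists are invented for illustration and are not data from the study, and the name `pearson_r` is ours.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores given by two graders to the same eight papers.
grader_a = [7, 5, 8, 3, 6, 9, 4, 6]
grader_b = [6, 5, 9, 4, 5, 8, 3, 7]

inter_rater_r = pearson_r(grader_a, grader_b)
print(round(inter_rater_r, 2))
```

A coefficient near 1 would mean the two graders ordered the papers almost identically on that scale.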

TABLE 1

Intercorrelations Between the Seven Post-test Essay Scores on the Criterion
Reference Scales (Inter-rater Correlations Appear in Diagonal)

Test                                1     2     3     4     5     6     7

1. Punctuation                    .72
2. Usage                          .37   .80
3. Sense of Audience & Purpose   -.18   .04   .72
4. Organization                   .05   .23   .37   .76
5. Use of Detail                 -.09   .10   .18   .22   .70
6. Attitude                       .19   .17   .15   .33   .32   .72
7. Topic & Concluding             .18   .06  -.08   .16   .40   .28   .88
Criterion Score                   .23   .42   .16   .32   .05   .40   .25


Eight of the 21 correlations are significantly different from zero at the .05 level
or beyond. None of these significant correlations exceeds .40.
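
The count of significant coefficients can be checked from the table. The paper does not state which significance test was used; the sketch below assumes a two-tailed test of each correlation against zero using the Fisher z approximation with n = 101, and the function name is ours.

```python
from math import atanh, sqrt
from statistics import NormalDist

N_STUDENTS = 101  # sample size reported in the text

# The 21 off-diagonal intercorrelations from Table 1, read row by row.
INTERCORRELATIONS = [
    0.37,
    -0.18, 0.04,
    0.05, 0.23, 0.37,
    -0.09, 0.10, 0.18, 0.22,
    0.19, 0.17, 0.15, 0.33, 0.32,
    0.18, 0.06, -0.08, 0.16, 0.40, 0.28,
]

def is_significant(r, n, alpha=0.05):
    """Two-tailed test of r against zero (Fisher z approximation, an assumption)."""
    z = atanh(r) * sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

significant = [r for r in INTERCORRELATIONS if is_significant(r, N_STUDENTS)]
print(len(significant), max(significant))
```

Under this assumption the cutoff falls at roughly |r| = .20, which yields eight significant coefficients, the largest being .40.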

The administration of the STEP Writing Test in the spring did not adhere to
standardization procedures in that 55 instead of 70 actual testing minutes were
allowed. However, the mean converted score for all students was 289, which
represents a school mean percentile of 96 for grade 11 and 63 for grade 12
according to the STEP school mean norms for fall testing. These figures suggest that
the test was neither too difficult nor the time allotted too short for our sample. This
time differential must be borne in mind, however, when interpreting our results.

To determine the maximum possible variance shared by STEP and the seven
criterion derived essay scores, a multiple regression was performed. This resulted
in an R of .58. Thus 34% of the variance of STEP is accounted for by the seven
essay scores.
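
The step from R to the percentage of shared variance is simply the square of the multiple correlation. A minimal sketch of that arithmetic, using only the R reported in the text (the function name is ours):

```python
R = 0.58  # multiple correlation reported in the text

def shared_variance(multiple_r):
    """Proportion of variance accounted for: the square of R."""
    return multiple_r ** 2

r_squared = shared_variance(R)
print(f"{r_squared:.4f} -> {round(r_squared * 100)}%")
```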

The validity of the STEP objectives for this sample using the essay scores as
criteria can be judged from the final row of coefficients in Table 1.

There is a statistically significant but small correlation of .23 between STEP
and the punctuation score. STEP does not have a statistically significant
relationship with our Sense of Audience and Purpose variable. STEP has statistically
significant but moderate correlations of .42 and .32 with Usage and Ability
to Organize, respectively, as measured by our criterion scales.

Discussion 

From the above analyses several points seem to emerge. First, abilities to
actually produce the component behaviors related to good writing are at best, for
this sample, only moderately related to the ability to revise, rearrange, judge, or
choose previously given information as on the STEP. The lack of stronger
agreement may be due to something closely akin to what the Report of
the Commission on English points out (College Entrance Examination Board,
1965, p. 80):

“It is not just that analysis is different from synthesis, or that learning 
how to see and understand is different from learning how to show and 
to communicate. The difference goes deeper, to the very quick of the 
student’s life, where, like any writer, he exposes himself to public scrutiny,
lays his mind bare for all to see.”

Secondly, the total writing score of the STEP does not seem to be related
strongly to any of the individually important writing behaviors our English
teachers felt minimally necessary for good composition. When the seven essay
scores are used together they agree moderately well with the writing score given by
the STEP.

An alternative interpretation of these data also suggests itself. Since the score
or parallel form reliability is unknown and since it is probably less than
inter-rater reliability (Gulliksen, 1950, p. 212), the actual relationship between STEP
Writing and actual writing behavior may be higher than the relationships obtained in
the above analyses (Coffman, 1966). If so, these obtained correlations may be
suggestive of a higher relationship between STEP Writing and actual writing
behavior than can be inferred from the relationships obtained.
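
One way to make this argument concrete is the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two reliabilities. The paper reports no such correction, so the sketch below is only an illustration. It assumes, hypothetically, that STEP is perfectly reliable and that the inter-rater correlation for Usage (.80) can stand in for the essay score reliability.

```python
from math import sqrt

def correct_for_attenuation(r_observed, rel_test, rel_criterion):
    """Classical correction: r_observed / sqrt(rel_test * rel_criterion)."""
    if not (0 < rel_test <= 1 and 0 < rel_criterion <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / sqrt(rel_test * rel_criterion)

R_STEP_USAGE = 0.42       # observed STEP-Usage correlation from Table 1
ASSUMED_REL_STEP = 1.00   # assumption: STEP treated as perfectly reliable
ASSUMED_REL_USAGE = 0.80  # assumption: inter-rater r used as a stand-in

corrected = correct_for_attenuation(
    R_STEP_USAGE, ASSUMED_REL_STEP, ASSUMED_REL_USAGE
)
print(round(corrected, 2))
```

If the true score reliability is lower than the inter-rater figure, as the text suggests it probably is, the corrected value would be larger still.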

For this sample the two essay scores which correlated most highly with the
STEP score were the Usage and Attitude Toward Writing scores. The moderate
correlation of .40 between STEP and the Attitude Toward Writing variable is
interesting. The attitude index gives high scores to those students who report,
on their own initiative, a large amount of self-directed, broadly oriented writing
which turned out to be pleasant and useful to them. This is, of course, a good
description of the student who has been successful in writing up to this time.

We suspect that if teachers were to rank students in order of their writing
ability, the rankings would correlate more highly with STEP than with our
rating scales. Our scales do not claim to measure global writing ability or to be
a comprehensive list of necessary ingredients of good writing; style, fictional and
narrative techniques, spelling, and substance are obviously missing.

This study did not attempt to validate this global aspect of STEP but only at- 
tempted to see how STEP compared with seven limited aspects of actual writing 
behavior. 




STEP gives a teacher information on how a student stands in relation to a
norm group on some kind of global writing score. It did not tell our teachers or
students specifically what the examinees could or could not do in writing a
composition. It did not offer specific directions for improving writing. This is the
conclusion we must draw from these data.

We would recommend that STEP Writing be modified to give, in addition to an
overall score, sub-scores on the various writing behaviors STEP claims it is
measuring. We would further suggest that these scores be validated against the actual
writing behaviors and not against composite writing performances.

Furthermore, if possible, the scores should be referenced to actual writing
behavior as well as to a norm group. This may well involve two separate scores with
different interpretations and implications.

References 

Buros, O. K. Sixth mental measurements yearbook. Highland Park, New Jersey:
The Gryphon Press, 1959.

Coffman, W. E. On the validity of essay tests of achievement. Journal of Edu- 
cational Measurement (in press, 1966). 

College Entrance Examination Board. Freedom and Discipline in English,
Report of the Commission on English. New York: College Entrance
Examination Board, 1965.

Educational Testing Service. Sequential tests of educational progress: Manual
for interpreting scores: Writing. Princeton: Educational Testing Service, 1957.

Gulliksen, H. Theory of mental tests. New York: John Wiley and Sons, 1950.

Rippey, R. M. A criterion referenced test in English composition. Chicago: The
Center for the Cooperative Study of Instruction, The University of Chicago,
1965. (Mimeo.)






APPENDIX 1 

ENGLISH COMPOSITIONS 
GRADING SCALES 

Variable 1, Punctuation. 

Suggested Point Value 

1 Does the subject use capital letters correctly? 

1 Does the subject terminate his sentences properly? 

1 Does the subject employ possessives or contractions, and are they punctuated 
correctly? 

1 Are commas used correctly? 

1 Are colons or semicolons used correctly? 

1 Does the student employ quotations, and use the quotation marks correctly? 

1 If the student uses quotations, are the commas, question marks and periods 
associated with the quotations placed correctly? 

1 If the subject makes no more than a single error in any of the above categories,
add 1 additional point.

1 Does the subject use italics, ellipses, exclamation marks, special indentations, 
or other miscellaneous punctuations correctly? 

Variable 2. Usage 
Suggested Point Value

1 Does the subject use incomplete sentences?

1 Does the subject use run-on sentences?

1 Does the subject make errors in agreement of subject and verb?

1 Does the subject make errors of pronoun reference?

1 Does the subject misuse any words?

1 Does the subject write any meaningless sentences?

1 Does the subject write any sentences which are obviously awkward?

1 If the subject made no more than a single error in any of the above categories
give one extra point.

1 Is the paper free from any miscellaneous errors of usage not covered by the first 
seven categories? 

Variable 3. Sense of Audience and Purpose: Sense of purpose and sense of audi- 
ence are related. In judging purpose, the following questions might be asked. 

Suggested Point Values 

1 Can you, the reader, state the purpose of the author? 

1 Was it difficult for you to ascertain just what the purpose was? 

1 Did the writing deal consistently with a single purpose?

1 Did the writing contribute to the purpose you identified? 

1 Would the writing be likely to move the intended audience in the direction 
intended by the author? 

1 Was the language and vocabulary suited for the target audience?

1 Could you identify the audience from reading the paper, or was the paper writ- 
ten in a bland “teacher-pleasing” style? 






1 Did the paper show evidence of the author’s having thought about the view- 
points of the reader and his biases? 

1 Did the author appear to consider those areas where the reader might have dif- 
ficulty in accepting his argument? 

Variable 4. Effective organization of paragraph. 

Suggested Point Values 

1 Does the author use more than just the simple sentence? 

1 Does the author use both compound and complex sentences? 

1 Does the author use sentences of varying length? (Are some sentences at least
three times as long as others?)

1 Does the author use both positive and negative examples?

2 Does the author arrange his sentences in a logical order such as concrete to
abstract, simple to complex, familiar to unfamiliar, geographically,
chronologically, etc.?

2 Does the order and organization of the sentences serve the purpose of the
paper and help to make it more interesting or easier to understand?

1 General effect of the paper as a unified whole. 

Variable 5. Selection of Detail to support the purpose of the paragraph. 

Suggested Point Values

2 Does the writer use details, or is his paper a jumble of abstractions?

1 Does the writer use both concrete and specific details? 

2 Are the details relevant to the purpose? 

2 Are the details well chosen and vivid? 

2 Are both physical and psychological details included? 

Variable 6. Attitude toward writing. 

Suggested Point Values 

1 Does the writer indicate that he has done much writing? 

1 Does the writer express favorable attitudes toward writing, or does he suggest
that it is a waste of time?

1 Does the writer show evidence of having enjoyed the writing which he has done?

2 Does the writer indicate that writing has served some useful purpose for him?

2 Has the writer written broadly, or is his writing narrow in scope and purpose?

1 Has the writer written largely in response to assignments or has he done writing

1 Has the writer chosen serious topics to write about, or trivial ones? Paragraphs
two and three might add to this.

Variable 7. Good topic and concluding sentence. (5 points on each) 

Suggested Point Value 

2 Can you clearly identify a topic and a concluding sentence? 

2 Is the placement of these sentences appropriate? 

2 Do these sentences have an appropriate impact or effect on the 

1 Does the topic sentence specify the object of the writing and an attitude about it?

2 Do the topic and concluding sentences serve a real purpose, or do you feel that
you could do without them?
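
The scales above can be read as simple additive checklists. The sketch below transcribes the suggested point values and sums the points a grader credits. The appendix does not say exactly how a grader's judgment maps onto credit (several Usage items, for example, ask whether an error occurs), so the True/False convention here, where True means "credit awarded", is an assumption, as are the names `POINT_VALUES` and `scale_score`. The example judgments are hypothetical.

```python
# Suggested point values per item, transcribed from Appendix 1.
POINT_VALUES = {
    "Punctuation": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Usage": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Sense of Audience and Purpose": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Effective Organization": [1, 1, 1, 1, 2, 2, 1],
    "Selection of Detail": [2, 1, 2, 2, 2],
    "Attitude Toward Writing": [1, 1, 1, 2, 2, 1, 1],
    "Topic and Concluding Sentence": [2, 2, 2, 1, 2],
}

def scale_score(scale, items_credited):
    """Sum the point values of the items a grader credited (one bool per item)."""
    values = POINT_VALUES[scale]
    if len(items_credited) != len(values):
        raise ValueError("one True/False judgment is needed per item")
    return sum(v for v, credited in zip(values, items_credited) if credited)

# Maximum obtainable score on each scale.
max_scores = {scale: sum(values) for scale, values in POINT_VALUES.items()}

# Hypothetical grader judgments for one paper on the Selection of Detail scale.
example = scale_score("Selection of Detail", [True, False, True, True, False])
print(max_scores)
print(example)
```

As transcribed, the suggested values on every scale sum to nine points, which puts the seven essay scores on a common range.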



